Search CORE

20 research outputs found

Unsupervised crosslingual adaptation of tokenisers for spoken language recognition

Author: Raymond W.M. Ng
Mauro Nicolao
Thomas Hain
Publication venue: 'Elsevier BV'
Publication date: 01/11/2017

Phone tokenisers are used in spoken language recognition (SLR) to obtain elementary phonetic information. We present a study on the use of deep neural network tokenisers. Unsupervised crosslingual adaptation was performed to adapt a baseline tokeniser, trained on English conversational telephone speech data, to different languages. Two training and adaptation approaches, namely cross-entropy adaptation and state-level minimum Bayes risk adaptation, were tested in a bottleneck i-vector SLR system and a phonotactic SLR system. The SLR systems using the tokenisers adapted to different languages were combined using score fusion, giving a 7-18% reduction in minimum detection cost function (minDCF) compared with the baseline configurations without adapted tokenisers. Analysis of the results showed that the ensemble tokenisers gave diverse representations of phonemes, thus bringing complementary effects when SLR systems with different tokenisers were combined. SLR performance was also shown to be related to the quality of the adapted tokenisers.

Crossref

Biblioteca Digital de la Comunidad de Madrid

White Rose Research Online

Unsupervised crosslingual adaptation of tokenisers for spoken language recognition

Author: Raymond W.M. Ng
Mauro Nicolao
Thomas Hain
Publication venue: 'Elsevier BV'
Publication date: 01/11/2017

Crossref

White Rose Research Online

Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction

Author: Jia Liu
Michael T Johnson
Wei-Qiang Zhang
Wei-Wei Liu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Introduction

Author: F Ramus
J Laver
KS Rao
L Bauer
L Rabiner
MA Zissman
T Schultz
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Literature Review

Author: AS House
C-H Wu
GR Botha
H Li
J-L Rouas
JL Rouas
K Sreenivasa Rao
KS Rao
L Mary
M-H Siu
MA Zissman
MA Zissman
S Eady
S Greenberg
SG Koolagudi
SK Chai
TJ Hazen
V Ramu Reddy
W Campbell
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Curriculum learning based approach for noise robust language identification using DNN with attention

Author: Ambikairajah
Bengio
Boll
Brij Mohan Lal
Castaldo
Dehak
Dehak
Ganapathy
Gonzalez-Dominguez
Gonzalez-Dominguez
Lei
Li
Lopez-Moreno
Ma
Maity
Martin
Martínez
Mary
Mounika
Muthusamy
Raj
Ranjan
Rao
Ravi Kumar
Ravi Kumar
Reddy
Rouas
Torres-Carrasquillo
Wang
Zissman
Zissman
Publication venue: 'Elsevier BV'

Crossref

Multilingual Speech Recognition

Author: A Kurematsu
A Matsumura
F Metze
M Finke
M Finke
M Gales
MA Zissman
T Kemp
T Schaaf
T Schultz
T Schultz
Y Matsumoto
Y Muthusamy
Publication venue: Morgan Kaufmann
Publication date: 01/01/2000

The speech-to-speech translation system Verbmobil requires a multilingual setting. This consists of recognition engines for three languages (German, English and Japanese) that run in one common framework, together with a language identification component that can switch between these recognizers. This article describes the challenges of multilingual speech recognition and presents different solutions to the automatic language identification task. The combination of the described components results in a flexible and user-friendly multilingual spoken dialog system.

CiteSeerX

Crossref

KITopen

Discriminative Boosting Algorithm for Diversified Front-End Phonotactic Language Recognition

Author: Jia Liu
Meng Cai
Michael T. Johnson
Wei-Qiang Zhang
Wei-Wei Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2016

Phonotactic and acoustic systems are currently the most widely used approaches to spoken language recognition (SLR). Parallel phone recognition followed by vector space modeling (PPRVSM) is a typical phonotactic system. To improve performance, researchers have sought to extract more complementary information from the training data by using multiple language-specific phone recognizers, different acoustic models and different acoustic features. These methods perform well but incur high computational cost and exploit only the complementary information in the training data. In this paper, we explore a novel approach to discriminative vector space model (VSM) training that uses a boosting framework, in which an ensemble of VSMs is trained sequentially, to exploit the discriminative information in the test data effectively. The effectiveness of our boosting variant comes from its emphasis on high-confidence test data when training discriminative models. Our variant also uses the original training data in VSM training. The discriminative boosting algorithm (DBA) is applied to the National Institute of Standards and Technology (NIST) Language Recognition Evaluation (LRE) 2009 task and shows performance improvements. The experimental results show that the proposed DBA gives relative reductions in equal error rate (EER) of 1.8%, 11.72% and 15.35% for 30 s, 10 s and 3 s test utterances, respectively, compared with the baseline system.
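
The abstract above reports its gains as relative reductions in equal error rate (EER). As a rough illustration of what those two quantities mean, the following Python sketch approximates an EER by sweeping a decision threshold over a set of detection scores and then computes a relative reduction between two error rates. The function names and the toy scores are assumptions made for this example only; they do not come from the paper, and the simple threshold sweep is not the official NIST scoring procedure.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Illustrative helper (hypothetical name): returns the average of the miss
    and false-alarm rates at the threshold where the two are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + fa) / 2
    return eer


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline


# Toy scores, invented for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
print(relative_reduction(10.0, 8.0))
```

Under this definition, a relative reduction is always measured against the baseline error rate, so the same absolute gain counts for more when the baseline error is already low.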

epublications@Marquette

Crossref

Current trends in multilingual speech processing

Author: David Imseng
Fabio Valente
Hervé Bourlard
Hui Liang
John Dines
Lakshmi Saheer
Mathew Magimai-Doss
Petr Motlicek
Philip N Garner
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Discriminative Boosting Algorithm for Diversified Front-End Phonotactic Language Recognition

Author: Jia Liu
Meng Cai
Michael T. Johnson
Wei-Qiang Zhang
Wei-Wei Liu
Publication venue: 'Springer Science and Business Media LLC'

Crossref